 discriminator network


DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis

Ahmad, Zeeshan, Bao, Shudi, Chen, Meng

arXiv.org Artificial Intelligence

In recent years, generative adversarial networks (GANs) have made significant progress in generating audio sequences. However, these models typically rely on bandwidth-limited mel-spectrograms, which constrain the resolution of generated audio sequences, and lead to mode collapse during conditional generation. To address this issue, we propose Deformable Periodic Network based GAN (DPN-GAN), a novel GAN architecture that incorporates a kernel-based periodic ReLU activation function to induce periodic bias in audio generation. This innovative approach enhances the model's ability to capture and reproduce intricate audio patterns. In particular, our proposed model features a DPN module for multi-resolution generation utilizing deformable convolution operations, allowing for adaptive receptive fields that improve the quality and fidelity of the synthetic audio. Additionally, we enhance the discriminator network using deformable convolution to better distinguish between real and generated samples, further refining the audio quality. We trained two versions of the model: DPN-GAN small (38.67M parameters) and DPN-GAN large (124M parameters). For evaluation, we use five different datasets, covering both speech synthesis and music generation tasks, to demonstrate the efficiency of the DPN-GAN. The experimental results demonstrate that DPN-GAN delivers superior performance on both out-of-distribution and noisy data, showcasing its robustness and adaptability. Trained across various datasets, DPN-GAN outperforms state-of-the-art GAN architectures on standard evaluation metrics, and exhibits increased robustness in synthesized audio.


Determination of galaxy photometric redshifts using Conditional Generative Adversarial Networks (CGANs)

Garcia-Fernandez, M.

arXiv.org Artificial Intelligence

Accurate and reliable determination of photometric redshifts is one of the key aspects of wide-field photometric surveys. Determining photometric redshifts for galaxies has traditionally been addressed with machine-learning and artificial intelligence techniques trained on a calibration sample of galaxies for which both photometry and spectrometry are available. In this paper, we present a new algorithmic approach for determining photometric redshifts of galaxies using Conditional Generative Adversarial Networks (CGANs). The proposed CGAN implementation approaches photometric redshift determination as a probabilistic regression: instead of determining a single value for the estimated redshift of the galaxy, it computes a full probability density. The proposed methodology is tested on Dark Energy Survey (DES) Y1 data and compared with other existing algorithms, such as a Random Forest regressor.
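
As a rough illustration of what "probabilistic regression" with a conditional generator can look like, the sketch below draws many noise vectors for one fixed galaxy and histograms the resulting redshift samples into a density. Everything here is an assumption made for illustration: `toy_generator` is a hypothetical stand-in for a trained CGAN generator, and the function names, bin count and redshift range are not taken from the paper.

```python
import numpy as np

def toy_generator(photometry, noise):
    """Hypothetical stand-in for a trained CGAN generator (assumption).

    Maps one photometry vector plus an array of noise values to an
    array of redshift samples, one sample per noise value.
    """
    # Toy rule: the centre depends on the mean of the input features,
    # and the noise input spreads the samples around that centre.
    centre = 0.5 + 0.1 * float(np.mean(photometry))
    return centre + 0.05 * noise

def redshift_pdf(photometry, n_samples=10000, bins=50, z_range=(0.0, 2.0), seed=0):
    """Estimate a redshift density for one galaxy by sampling the generator."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_samples)
    samples = toy_generator(photometry, noise)
    # density=True normalises the histogram so it integrates to 1 over z_range.
    density, edges = np.histogram(samples, bins=bins, range=z_range, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, density

photometry = np.array([0.3, -0.1, 0.4, 0.2])  # made-up input features
z, pdf = redshift_pdf(photometry)
```

A point estimate can still be read off the density afterwards, for example as the bin centre with the highest value, `z[np.argmax(pdf)]`.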


GAN-Based Architecture for Low-dose Computed Tomography Imaging Denoising

Wang, Yunuo, Yang, Ningning, Li, Jialin

arXiv.org Artificial Intelligence

Generative Adversarial Networks (GANs) have surfaced as a revolutionary element within the domain of low-dose computed tomography (LDCT) imaging, providing an advanced resolution to the enduring issue of reconciling radiation exposure with image quality. This comprehensive review synthesizes the rapid advancements in GAN-based LDCT denoising techniques, examining the evolution from foundational architectures to state-of-the-art models incorporating advanced features such as anatomical priors, perceptual loss functions, and innovative regularization strategies. We critically analyze various GAN architectures, including conditional GANs (cGANs), CycleGANs, and Super-Resolution GANs (SRGANs), elucidating their unique strengths and limitations in the context of LDCT denoising. The evaluation provides both qualitative and quantitative results related to the improvements in performance in benchmark and clinical datasets with metrics such as PSNR, SSIM, and LPIPS. After highlighting the positive results, we discuss some of the challenges preventing a wider clinical use, including the interpretability of the images generated by GANs, synthetic artifacts, and the need for clinically relevant metrics. The review concludes by highlighting the essential significance of GAN-based methodologies in the progression of precision medicine via tailored LDCT denoising models, underlining the transformative possibilities presented by artificial intelligence within contemporary radiological practice.
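
Of the metrics named above, PSNR is the simplest to state. The sketch below uses the standard definition, 10·log10(data_range² / MSE); the function name and the `data_range` parameter are illustrative choices, not something specified in the review.

```python
import numpy as np

def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    data_range is the span of valid intensities (assumed 1.0 here for
    images scaled to [0, 1]).
    """
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01.
clean = np.zeros((4, 4))
noisy = clean + 0.1
score = psnr(clean, noisy)
```

Higher values mean the denoised image is closer to the reference; SSIM and LPIPS need more machinery and are not sketched here.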


Reviews: IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis

Neural Information Processing Systems

Update: I raised my score by two points because the rebuttal and reviews/comments revealed more differences than I originally noticed with respect to the AGE work, in particular in terms of the use of the KL divergence as a discriminator per example, and because the authors promised to discuss the connection to AGE and potentially expand the experimental section. I remain concerned that the resulting model is not a variational auto-encoder anymore despite the naming of the model (but rather closer to a GAN where the discriminator is based on the KL divergence), and about the experimental section, which reveals that the method works well, but does not provide a rich analysis for the proposed improvements. Rather than using a separate discriminator network, the work proposes a learning objective which encourages the encoder to discriminate between real data and generated data: it guides the approximate posterior to be close to the prior in the real data case and far from the prior otherwise. The approach is illustrated on the task of synthesizing high-resolution images, trained on the CelebA-HQ dataset. First, high-quality image generation remains an important area of research, and as a result, the paper's topic is relevant to the community.
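
To make the "KL divergence as a per-example discriminator" idea concrete, here is a minimal sketch. It assumes a diagonal Gaussian approximate posterior and a standard normal prior, for which the KL divergence has a closed form. The hinge with a margin in `encoder_adversarial_term` is one plausible way to push generated samples "far from the prior"; the margin value and both function names are assumptions for illustration, not details confirmed by this review.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over the last axis."""
    mu = np.asarray(mu, dtype=np.float64)
    logvar = np.asarray(logvar, dtype=np.float64)
    return 0.5 * np.sum(mu ** 2 + np.exp(logvar) - logvar - 1.0, axis=-1)

def encoder_adversarial_term(kl_real, kl_fake, margin=10.0):
    """Encoder-side term (assumed form): small KL on real data,
    KL of at least `margin` on generated data."""
    # The hinge stops rewarding the encoder once kl_fake exceeds the margin.
    return kl_real + np.maximum(0.0, margin - kl_fake)

kl_real = kl_to_standard_normal([1.0, 1.0], [0.0, 0.0])  # posterior near the prior
kl_fake = kl_to_standard_normal([2.0, 2.0], [0.0, 0.0])  # posterior further away
loss = encoder_adversarial_term(kl_real, kl_fake)
```

Read this way, the encoder's KL value acts as a per-example score for "real versus generated", which is the sense in which the model resembles a GAN with a KL-based discriminator.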


Reviews: Triple Generative Adversarial Nets

Neural Information Processing Systems

In this paper, the authors propose a new formulation of adversarial networks for image generation that incorporates three networks instead of the usual generator G and discriminator D. In addition to G and D, they include a classifier C, which cooperates with G to learn a compatible joint distribution (X,Y) over images and labels. The authors show how this formulation overcomes pitfalls of previous class-conditional GANs; namely, that class-conditional generator and discriminator networks have competing objectives that may prevent them from learning the true distribution and prevent G from accurately generating class-conditional samples. The authors identify the following deficiency in class-conditional GAN setups: "The competition between G and D essentially arises from their two-player formulation, where a single discriminator network has to play two incompatible roles--identifying fake samples and predicting labels". The argument goes that if G were perfect, then a class-conditional D has an equal incentive to output 0 since the sample comes from G, and to output 1 since the image matches the label. This might force D to systematically underperform as a classifier, and therefore prevent G from learning to produce accurate class-conditional samples.


Image inpainting for corrupted images by using the semi-super resolution GAN

Momen-Tayefeh, Mehrshad, Momen-Tayefeh, Mehrdad, Ghahramani, Amir Ali Ghafourian

arXiv.org Artificial Intelligence

Image inpainting is a valuable technique for enhancing images that have been corrupted. The primary challenge in this research revolves around the extent of corruption in the input image that the deep learning model must restore. To address this challenge, we introduce a Generative Adversarial Network (GAN) for learning and replicating the missing pixels. Additionally, we have developed a distinct variant of the Super-Resolution GAN (SRGAN), which we refer to as the Semi-SRGAN (SSRGAN). Furthermore, we leveraged three diverse datasets to assess the robustness and accuracy of our proposed model. Our training process involves varying levels of pixel corruption to attain optimal accuracy and generate high-quality images.


Cooperative Edge Caching Based on Elastic Federated and Multi-Agent Deep Reinforcement Learning in Next-Generation Network

Wu, Qiong, Wang, Wenhua, Fan, Pingyi, Fan, Qiang, Zhu, Huiling, Letaief, Khaled B.

arXiv.org Artificial Intelligence

Edge caching is a promising solution for next-generation networks by empowering caching units in small-cell base stations (SBSs), which allows user equipments (UEs) to fetch users' requested contents that have been pre-cached in SBSs. It is crucial for SBSs to accurately predict popular contents through learning while protecting users' personal information. Traditional federated learning (FL) can protect users' privacy, but the data discrepancies among UEs can lead to a degradation in model quality. Therefore, it is necessary to train personalized local models for each UE to predict popular contents accurately. In addition, the cached contents can be shared among adjacent SBSs in next-generation networks, thus caching predicted popular contents in different SBSs may affect the cost to fetch contents. Hence, it is critical to determine where the popular contents are cached cooperatively. To address these issues, we propose a cooperative edge caching scheme based on elastic federated and multi-agent deep reinforcement learning (CEFMR) to optimize the cost in the network. We first propose an elastic FL algorithm to train the personalized model for each UE, where an adversarial autoencoder (AAE) model is adopted for training to improve the prediction accuracy; then a popular content prediction algorithm is proposed to predict the popular contents for each SBS based on the trained AAE model. Finally, we propose a multi-agent deep reinforcement learning (MADRL) based algorithm to decide where the predicted popular contents are collaboratively cached among SBSs. Our experimental results demonstrate the superiority of our proposed scheme to existing baseline caching schemes.


Utilizing generative adversarial networks for stable structure generation in Angry Birds

AIHub

The popular physics-based puzzle game series Angry Birds has been played and enjoyed by millions of people since its original launch in 2009. However, while the game may seem somewhat simple and straightforward to play, with even very young children being able to quickly grasp its mechanics and strategies, artificial intelligence has so far failed to obtain human-level performance. Along with a lack of knowledge about the game's internal physics engine and imprecise object detection algorithms, one of the core challenges to training better game-playing agents is the limited number and variety of available game levels. The levels in Angry Birds often contain individual structures that are made up of multiple rectangular 2D blocks, such as those shown in figure 1. While a handful of previous structure generators for Angry Birds exist, they often rely on hard-coded design constraints that limit the output diversity.


Explainable unsupervised multi-modal image registration using deep networks

Wang, Chengjia, Papanastasiou, Giorgos

arXiv.org Artificial Intelligence

Clinical decision making from magnetic resonance imaging (MRI) combines complementary information from multiple MRI sequences (defined as 'modalities'). MRI image registration aims to geometrically 'pair' diagnoses from different modalities, time points and slices. Both intra- and inter-modality MRI registration are essential components in clinical MRI settings. Further, an MRI image processing pipeline that can address both affine and non-rigid registration is critical, as both types of deformations may be occurring in real MRI data scenarios. Unlike image classification, explainability is not commonly addressed in image registration deep learning (DL) methods, as it is challenging to interpret model-data behaviours against transformation fields. To properly address this, we incorporate Grad-CAM-based explainability frameworks in each major component of our unsupervised multi-modal and multi-organ image registration DL methodology. We previously demonstrated that we were able to reach superior performance (against the current standard Syn method). In this work, we show that our DL model becomes fully explainable, setting the framework to generalise our approach on further medical imaging data.


State-Conditioned Adversarial Subgoal Generation

Wang, Vivienne Huiling, Pajarinen, Joni, Wang, Tinghuai, Kämäräinen, Joni-Kristian

arXiv.org Artificial Intelligence

Hierarchical reinforcement learning (HRL) proposes to solve difficult tasks by performing decision-making and control at successively higher levels of temporal abstraction. However, off-policy HRL often suffers from the problem of a non-stationary high-level policy since the low-level policy is constantly changing. In this paper, we propose a novel HRL approach for mitigating the non-stationarity by adversarially enforcing the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy. In practice, the adversarial learning is implemented by training a simple state-conditioned discriminator network concurrently with the high-level policy which determines the compatibility level of subgoals. Comparison to state-of-the-art algorithms shows that our approach improves both learning efficiency and performance in challenging continuous control tasks.